Quantitative Evaluation of Clustering Results Using Computational Negative Controls
Authors
Abstract
Most partition-based cluster analysis methods (e.g., k-means) will partition any dataset D into k subsets, regardless of whether such a partitioning is appropriate for the data. This paper presents a family of permutation-based procedures to determine both the number of clusters k best supported by the available data and the weight of evidence in support of this clustering. These procedures use one of 37 cluster quality measures to assess the influence of structure-destroying random permutations applied to the original dataset. Results are presented for a collection of simulated datasets for which the correct cluster structure is known unambiguously.
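The general idea of a permutation-based negative control can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not specify the permutation scheme or list the 37 quality measures, so the sketch assumes independent shuffling of each column as the structure-destroying permutation and the fraction of variance explained by a k-means partition as the quality measure. All function names (`kmeans_wcss`, `explained_fraction`, `permute_columns`, `negative_control`, `two_blob_demo`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans_wcss(points, k, rng, restarts=5, iters=50):
    """Plain Lloyd's k-means; returns the lowest within-cluster sum of squares found."""
    best = float("inf")
    for _ in range(restarts):
        centers = rng.sample(points, k)
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
                groups[nearest].append(p)
            # An empty cluster keeps its previous center.
            updated = [mean_point(g) if g else centers[j] for j, g in enumerate(groups)]
            if updated == centers:
                break
            centers = updated
        wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wcss)
    return best


def explained_fraction(points, k, rng):
    """Assumed quality measure: 1 - WCSS / total sum of squares (higher is better)."""
    center = mean_point(points)
    tss = sum(sq_dist(p, center) for p in points)
    return 1.0 - kmeans_wcss(points, k, rng) / tss


def permute_columns(points, rng):
    """Assumed negative control: shuffle each column independently.

    Marginal distributions are preserved; joint structure is destroyed.
    """
    columns = [list(col) for col in zip(*points)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


def negative_control(points, k, n_perm=20, seed=0):
    """Compare the observed quality score with scores on permuted copies."""
    rng = random.Random(seed)
    observed = explained_fraction(points, k, rng)
    null = [explained_fraction(permute_columns(points, rng), k, rng)
            for _ in range(n_perm)]
    # Empirical p-value with the usual +1 correction.
    p_value = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return {"observed": observed, "null": null, "p_value": p_value}


def two_blob_demo(seed=1, n=20):
    """Simulated data with a known structure: two well-separated 2-D blobs."""
    rng = random.Random(seed)
    blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(n)]
    return blob_a + blob_b


if __name__ == "__main__":
    result = negative_control(two_blob_demo(), k=2)
    print(result["observed"], max(result["null"]), result["p_value"])
```

Under these assumptions, one plausible way to choose k is to repeat `negative_control` over a range of k values and prefer the k whose observed score stands furthest above its permutation scores; the gap between observed and permuted scores then serves as a rough indication of the weight of evidence.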
Similar resources
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
Experimental Evaluation of Algorithmic Effort Estimation Models using Projects Clustering
One of the most important aspects of software project management is estimating the cost and time required to run an information system. Software managers therefore try to base their estimates on behavior, properties, and project restrictions. Software cost estimation refers to the process of predicting the development requirements of a software system. Various kinds of effort estimation patter...
Bilateral Weighted Fuzzy C-Means Clustering
Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...
Evaluation of Groundwater Vulnerability Using Data Mining Technique in Hashtgerd Plain
Groundwater vulnerability assessment would be one of the effective informative methods to provide a basis for determining source of pollution. Vulnerability maps are employed as an important solution in order to handle entrance of pollution into the aquifers. A common way to develop groundwater vulnerability map is DRASTIC. Meanwhile, application of the method is not easy for any aquifer due to...